
feat(tables): background jobs (delete/export/backfill on trigger.dev) + tenant-scoped query performance#4915

Merged: TheodoreSpeaks merged 45 commits into staging from improvement/table-row-deletes on Jun 12, 2026

Conversation

@TheodoreSpeaks

@TheodoreSpeaks TheodoreSpeaks commented Jun 9, 2026

Collaborator

Summary

Background jobs, async exports, and a systemic performance pass for Tables.

Background row deletes + table_jobs

  • "Select all" deletes (filter + optional deselections) send { filter, excludeRowIds } and delete in keyset-paginated background batches; rows inserted mid-job are spared via a created_at cutoff.
  • New table_jobs table (one row per job, type = import | delete | export | backfill). Concurrency gate: partial-unique index on (table_id) WHERE status='running' AND type <> 'export' — one writing job per table; read-only exports run concurrently.
  • Read-path mask: while a delete job runs, every read (grid, exports, search) hides the doomed rows, so mid-job reads match the eventual result.
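The delete flow above can be modeled in memory to show how the keyset cursor, the created_at cutoff, and excludeRowIds interact. This is a minimal sketch, not the PR's worker: the Row and DeleteJob shapes, the `<=` comparison against the cutoff, and the body of selectRowIdPage are assumptions made for illustration, and the real job runs SQL against PostgreSQL.

```typescript
// Hypothetical in-memory model of the background delete.
interface Row {
  id: string;
  createdAt: number; // epoch millis
  data: Record<string, unknown>;
}

interface DeleteJob {
  filter: (row: Row) => boolean; // stand-in for the serialized filter
  excludeRowIds: Set<string>; // rows deselected after "select all"
  cutoff: number; // rows created after this instant are spared (assumed <=)
}

/** Returns up to `pageSize` doomed ids with id > afterId, in id order (a keyset seek). */
function selectRowIdPage(
  rows: Row[],
  job: DeleteJob,
  afterId: string | null,
  pageSize: number
): string[] {
  return rows
    .filter((r) => afterId === null || r.id > afterId)
    .filter((r) => r.createdAt <= job.cutoff && job.filter(r) && !job.excludeRowIds.has(r.id))
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .slice(0, pageSize)
    .map((r) => r.id);
}

/** Drains the job page by page; the cursor is the last id of the previous page. */
function runDeleteJob(
  table: Row[],
  job: DeleteJob,
  pageSize: number
): { remaining: Row[]; deleted: number } {
  let remaining = table.slice();
  let cursor: string | null = null;
  let deleted = 0;
  for (;;) {
    const page: string[] = selectRowIdPage(remaining, job, cursor, pageSize);
    if (page.length === 0) break;
    const doomed = new Set(page);
    remaining = remaining.filter((r) => !doomed.has(r.id));
    deleted += page.length;
    cursor = page[page.length - 1] ?? cursor;
  }
  return { remaining, deleted };
}
```

Because each page resumes from the last deleted id instead of recounting from the start, the per-page cost does not grow as the job progresses.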

Jobs on trigger.dev

  • Import, delete, export, and output-column backfill run as trigger.dev tasks (apps/sim/background/) with an in-process runDetached fallback — jobs survive deploys/restarts.
  • Backfill is hybrid: ≤500 completed runs are processed inline (so the response returns consistent data); above that, a background job takes over.

CSV export

  • Hybrid by size: ≤10k rows keep the synchronous stream; larger tables run an async export job → storage → presigned download (auto-downloads for the initiating session).
  • Unified jobs tray (imports + exports) with aggregate state icon, live progress, cancel, and re-download.
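The two hybrid thresholds (500 completed runs for backfill, 10k rows for export) amount to a small routing decision. The sketch below is illustrative only: the function and constant names are hypothetical, and the deleteJobRunning parameter reflects a later commit in this PR that routes every mid-delete export through a job.

```typescript
// Thresholds quoted in the summary; names are hypothetical.
const SYNC_EXPORT_MAX_ROWS = 10000;
const INLINE_BACKFILL_MAX_RUNS = 500;

type ExportMode = "sync-stream" | "async-job";
type BackfillMode = "inline" | "background-job";

function chooseExportMode(rowCount: number, deleteJobRunning: boolean): ExportMode {
  // While a delete job runs, rowCount is only an estimate, so always use a job.
  if (deleteJobRunning) return "async-job";
  return rowCount <= SYNC_EXPORT_MAX_ROWS ? "sync-stream" : "async-job";
}

function chooseBackfillMode(completedRuns: number): BackfillMode {
  return completedRuns <= INLINE_BACKFILL_MAX_RUNS ? "inline" : "background-job";
}
```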

Performance (measured on a 1M-row table in a 12M-row shared relation)

| Path | Before | After |
| --- | --- | --- |
| 1M-row delete | ~25 min | seconds/page, via (table_id, id) index (0231) |
| 1M-row export | ~30 min | ~1 min, via keyset pagination (OFFSET was O(N²)) |
| Grid infinite scroll / drains | OFFSET per page | (order_key, id) cursor seek |
| Filtered count (ILIKE) | 11–12.7s | 0.6s |
| Sorted view, per page | 9.7s + disk spills | 0.76s |
| Cmd+F search | 75s | 2s |
| Equality filters / $in | 326ms–1.1s | 17ms, via tenant-scoped GIN (0232) |
| Unique-constraint check (every write) | 3.5s | <1s |
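The "OFFSET was O(N²)" remark can be checked with simple arithmetic: each OFFSET page walks past every row before it, while a keyset page seeks straight to its start. The sketch below counts rows walked under that simplified model. It ignores index-seek overhead, and the 10,000-row page size is an assumption borrowed from the delete page size, since the export page size is not stated.

```typescript
/** Rows the database walks to drain `totalRows` in pages of `pageSize` using OFFSET. */
function rowsWalkedWithOffset(totalRows: number, pageSize: number): number {
  let walked = 0;
  for (let offset = 0; offset < totalRows; offset += pageSize) {
    // Each page skips `offset` rows, then reads up to `pageSize` more.
    walked += offset + Math.min(pageSize, totalRows - offset);
  }
  return walked;
}

/** With a keyset cursor every row is read exactly once. */
function rowsWalkedWithKeyset(totalRows: number): number {
  return totalRows;
}

// 1M rows in 10k pages: 50,500,000 rows walked with OFFSET vs 1,000,000 with keyset.
const offsetCost = rowsWalkedWithOffset(1000000, 10000);
const keysetCost = rowsWalkedWithKeyset(1000000);
```

Under these assumptions the OFFSET drain does roughly 50 times the work, and the ratio grows linearly with table size.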

Root cause for most of these: the planner cannot estimate selectivity for jsonb predicates, so it picked parallel seq scans over the entire shared relation (every tenant's rows). Two mechanisms fix it, plus an autovacuum adjustment:

  • withSeqscanOff (lib/table/planner.ts): transaction-scoped SET LOCAL enable_seqscan = off around the queries with unindexable predicates (ILIKE/range/lower()/sort keys).
  • Migration 0232: replaces the cross-tenant GIN on data with btree_gin (table_id, data jsonb_path_ops) — containment lookups resolve tenant-scoped inside the index (and the index is smaller, 529MB vs 694MB on dev).
  • Autovacuum tuned to 2% churn on the shared relation (0231) so one tenant's mass delete doesn't degrade reads for days.
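A helper like withSeqscanOff might be shaped as follows. This is a sketch under stated assumptions: the Tx and Db interfaces are hypothetical stand-ins for the project's database client, and only the SET LOCAL statement is taken from the description. Note that enable_seqscan = off does not forbid sequential scans; it makes the planner treat them as very expensive, so index plans win whenever one exists.

```typescript
// Hypothetical minimal transaction surface; the real helper lives in lib/table/planner.ts.
interface Tx {
  execute(sql: string): Promise<unknown>;
}

interface Db {
  transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>;
}

async function withSeqscanOff<T>(db: Db, fn: (tx: Tx) => Promise<T>): Promise<T> {
  return db.transaction(async (tx) => {
    // SET LOCAL lasts only until this transaction commits or rolls back,
    // so the setting cannot leak to other queries on a pooled connection.
    await tx.execute("SET LOCAL enable_seqscan = off");
    return fn(tx);
  });
}
```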

Fit & finish

  • Error toasts surface the real failure (drizzle wraps DB errors; routes now classify on the root cause) — e.g. "Row limit exceeded — this table is capped at 10,000 rows" instead of "Failed to insert row".
  • Export tray icon appears immediately on kickoff; tray restored on the tables list page.
  • Import batches 5k rows (bind-param bounded), delete pages 10k.
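"Bind-param bounded" refers to PostgreSQL's limit of 65,535 bind parameters per statement, which caps how many rows fit in a single multi-row INSERT. How the PR arrives at 5k is not shown; the sketch below only illustrates the bound, and the parameters-per-row figure in the usage note is an assumption.

```typescript
// PostgreSQL's wire protocol carries the parameter count in a 16-bit field.
const PG_MAX_BIND_PARAMS = 65535;

/** Largest number of rows that fits in one INSERT given the bind params each row needs. */
function maxRowsPerInsert(paramsPerRow: number): number {
  if (!Number.isInteger(paramsPerRow) || paramsPerRow <= 0) {
    throw new RangeError("paramsPerRow must be a positive integer");
  }
  return Math.floor(PG_MAX_BIND_PARAMS / paramsPerRow);
}

/** Splits `items` into consecutive batches of at most `size` elements. */
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

For instance, if each row needed 13 parameters (a made-up figure), maxRowsPerInsert(13) would return 5041, so a 5,000-row batch would fit.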

Migrations

  • 0231_table_jobs_and_keyset: table_jobs + import-state data migration + (table_id, id) index + autovacuum tuning + import_* column drops.
  • 0232_tenant_scoped_data_gin: btree_gin extension + tenant-scoped containment index (replaces user_table_rows_data_gin_idx).

Test plan

  • bunx vitest run lib/table app/api/table app/api/v1 hooks/queries (395 tests)
  • bun run lint:check, tsc --noEmit, bun run check:api-validation
  • Manual: 1M-row table — select-all delete, async export + download, filtered/sorted scrolls, Cmd+F, unique-column inserts; trigger.dev path with TRIGGER_DEV_ENABLED=true; EXPLAIN ANALYZE before/after for each perf claim.

🤖 Generated with Claude Code

@vercel

vercel Bot commented Jun 9, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | Jun 12, 2026 6:43am |


@cursor

cursor Bot commented Jun 9, 2026


PR Summary

High Risk
Large surface area touching data deletion, job concurrency, storage cleanup, and API contract renames (import_* → job_*); incorrect filter/mask logic could hide or delete the wrong rows.

Overview
This PR generalizes table background work around a table_jobs model (replacing per-table import_* fields with jobStatus / jobId / jobType) and wires import, delete, export, and backfill through markTableJobRunning with optional Trigger.dev dispatch and releaseJobClaim on failed kicks.

New async APIs: POST …/delete-async (filter + excludeRowIds + created_at cutoff), POST …/export-async, GET …/export/download, and GET /api/table/jobs for workspace export tray rows. Job cancel moves to POST …/job/cancel with typed SSE job events (exports auto-download when initiated this session).

Bulk table UX: Select-all row selection supports exclusions; delete/run/stop/cancel accept filter + excludeRowIds instead of loading every row id. Row listing adds after keyset cursor and filter-scoped rowTotal. Shared tableFilterError, rootErrorMessage, and rowWriteErrorResponse unify API error handling (including friendly row-limit toasts).
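Classifying on the root cause usually means walking the error's cause chain down to the innermost error. The sketch below borrows the name rootErrorMessage from the summary, but its body is a guess at the idea and not the PR's code. friendlyRowWriteError and its regular expression are hypothetical.

```typescript
/** Returns the `cause` of an error-like value, or undefined when there is none. */
function causeOf(value: unknown): unknown {
  return typeof value === "object" && value !== null
    ? (value as { cause?: unknown }).cause
    : undefined;
}

/** Follows `cause` links to the innermost error and returns its message. */
function rootErrorMessage(err: unknown): string {
  let current = err;
  const seen = new Set<unknown>(); // guards against cyclic cause chains
  for (;;) {
    const next = causeOf(current);
    if (next === undefined || next === null || seen.has(next)) break;
    seen.add(current);
    current = next;
  }
  return current instanceof Error ? current.message : String(current);
}

/** Hypothetical classifier: surface row-limit failures, fall back to a generic message. */
function friendlyRowWriteError(err: unknown): string {
  const root = rootErrorMessage(err);
  return /row limit/i.test(root) ? root : "Failed to insert row";
}
```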

Ops: Stale-job cron marks stalled table_jobs failed, prunes terminal jobs after 24h, and deletes pruned export files from storage.
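The janitor's three duties can be sketched as one sweep over job rows. Everything here except the 24-hour prune window is an assumption: the TableJob fields, the status names other than running and failed, the heartbeat-based stall test, and the stall threshold are illustrative.

```typescript
type JobStatus = "running" | "ready" | "failed" | "cancelled";

interface TableJob {
  id: string;
  type: "import" | "delete" | "export" | "backfill";
  status: JobStatus;
  heartbeatAt: number; // epoch millis of the last progress update
  finishedAt: number | null; // set once the job reaches a terminal status
  exportFileKey?: string; // storage key for export output, if any
}

const PRUNE_AFTER_MS = 24 * 60 * 60 * 1000; // terminal jobs are pruned after 24h

function sweepJobs(
  jobs: TableJob[],
  now: number,
  stallAfterMs: number // assumed threshold; the real value is not stated
): { kept: TableJob[]; filesToDelete: string[] } {
  const kept: TableJob[] = [];
  const filesToDelete: string[] = [];
  for (const job of jobs) {
    if (job.status === "running") {
      if (now - job.heartbeatAt > stallAfterMs) {
        // Stalled: mark failed so the table's job slot frees up.
        const failed: TableJob = { ...job, status: "failed", finishedAt: now };
        kept.push(failed);
      } else {
        kept.push(job);
      }
      continue;
    }
    const expired = job.finishedAt !== null && now - job.finishedAt > PRUNE_AFTER_MS;
    if (!expired) {
      kept.push(job);
      continue;
    }
    // Pruned: remember the export file so storage can be cleaned up too.
    if (job.type === "export" && job.exportFileKey !== undefined) {
      filesToDelete.push(job.exportFileKey);
    }
  }
  return { kept, filesToDelete };
}
```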

Reviewed by Cursor Bugbot for commit 4fc09f7.

Comment thread apps/sim/hooks/queries/tables.ts
@greptile-apps

greptile-apps Bot commented Jun 9, 2026

Contributor

Greptile Summary

This PR ships background-job infrastructure for Tables: a new table_jobs table with a partial-unique concurrency gate, trigger.dev dispatch for CSV import, filtered bulk delete, async export, and output-column backfill, plus a comprehensive query-performance pass (keyset pagination, withSeqscanOff, tenant-scoped GIN index, autovacuum tuning) measured to cut worst-case read times from minutes to under a second on a 1 M-row table.

  • table_jobs + runners: replaces the import_* columns on user_table_definitions with a proper jobs table, wires each job type to a trigger.dev task (with runDetached in-process fallback), and correctly patches the previously-noted ghost-job risk by releasing the claim in a catch block on dispatch failure.
  • Read-path delete mask: pendingDeleteMask hides doomed rows from grid reads, exports, and searches while a delete job is running, using IS NOT TRUE to safely handle NULL-valued JSONB predicates.
  • Keyset pagination: queryRows and the delete/export workers both switch from OFFSET to (order_key, id) / (position, id) cursor seeks, turning O(N²) full-table drains into O(page) index seeks.
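The IS NOT TRUE detail matters because SQL predicates are three-valued. A comparison on a missing JSONB field yields NULL; a DELETE ... WHERE filter skips such rows, but a mask written as NOT (filter) would hide them, since NOT NULL is still NULL and WHERE keeps only TRUE. The sketch below models that logic in memory. The row shape and the sample filter are hypothetical, and the real pendingDeleteMask presumably also applies the cutoff and exclusions.

```typescript
type Tri = boolean | null; // SQL boolean: TRUE, FALSE, or NULL (unknown)

function sqlNot(v: Tri): Tri {
  return v === null ? null : !v;
}

function isNotTrue(v: Tri): boolean {
  return v !== true;
}

/** A WHERE clause keeps a row only when its condition evaluates to TRUE. */
function whereKeeps(v: Tri): boolean {
  return v === true;
}

interface MaskRow {
  id: number;
  status?: string;
}

/** Sample delete filter, status = 'archived'; a missing field compares to NULL. */
function filterMatches(row: MaskRow): Tri {
  return row.status === undefined ? null : row.status === "archived";
}

/** Mask written as NOT (filter): wrongly hides rows where the filter is NULL. */
function visibleWithNot(rows: MaskRow[]): number[] {
  return rows.filter((r) => whereKeeps(sqlNot(filterMatches(r)))).map((r) => r.id);
}

/** Mask written as (filter) IS NOT TRUE: hides only rows the delete will remove. */
function visibleWithIsNotTrue(rows: MaskRow[]): number[] {
  return rows.filter((r) => whereKeeps(isNotTrue(filterMatches(r)))).map((r) => r.id);
}
```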

Confidence Score: 5/5

Safe to merge — the concurrency gate, delete mask, keyset paging, and ghost-job fixes all look correct; the findings are optimization opportunities rather than blockers.

The export runner buffers the entire serialized file in memory before uploading, and pendingDeleteMask is re-queried on every export page rather than once per job. Both are optimization opportunities with no risk to data correctness or user-visible state, and the 1 M-row manual test suggests they work within practical limits for today's workloads.

apps/sim/lib/table/export-runner.ts — in-memory file accumulation will need attention before this export path is stressed with very wide or extremely large tables.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/table/export-runner.ts | New async export worker. Pages rows correctly via keyset cursor, but accumulates the entire serialized file in a chunks array before uploading, which can exhaust container memory for wide tables in the runDetached path. |
| apps/sim/lib/table/delete-runner.ts | New background delete worker. Correctly keyset-paginated, cutoff-guarded, ownership-gated per page, and handles cancel/supersede cleanly. |
| apps/sim/lib/table/service.ts | Major refactor: replaces import_* columns with a separate table_jobs table, adds keyset cursor support to queryRows, introduces pendingDeleteMask for mid-job read consistency, and wraps seqscan-prone queries in withSeqscanOff. |
| apps/sim/lib/table/planner.ts | New withSeqscanOff helper. Correctly uses SET LOCAL so the flag is transaction-scoped and self-cleaned. |
| packages/db/migrations/0233_table_jobs_and_keyset.sql | Creates table_jobs with partial-unique concurrency gate, migrates existing import_* state safely, builds performance indexes CONCURRENTLY after an embedded COMMIT. |
| apps/sim/app/api/table/[tableId]/delete-async/route.ts | Kick-off route for background deletes. Correctly releases the job claim on trigger.dev dispatch failure, captures cutoff atomically before dispatch. |
| apps/sim/app/api/table/[tableId]/export-async/route.ts | Async export kick-off. Correctly releases job claim on trigger.dev failure; export jobs bypass the one-running-job gate as intended. |
| apps/sim/lib/table/backfill-runner.ts | Extracted from service.ts; correctly uses the hybrid inline/background threshold and releases the job claim on trigger.dev failure. |
| apps/sim/hooks/queries/tables.ts | Adds keyset cursor pagination with cursor-to-offset fallback, scopes the async-delete optimistic update to only the active filter view. |
| apps/sim/app/api/cron/cleanup-stale-executions/route.ts | Janitor migrated from import_* columns to table_jobs rows. Marks stale running jobs failed, prunes terminals after 24 h, cleans up orphaned export files. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Route as POST /delete-async
    participant Service as table service
    participant DB as PostgreSQL
    participant Worker as delete runner
    participant SSE as SSE stream

    Client->>Route: "{filter, excludeRowIds, estimatedCount}"
    Route->>Service: markTableJobRunning(tableId, jobId, 'delete', payload)
    Service->>DB: INSERT table_jobs ON CONFLICT DO NOTHING
    DB-->>Service: "claimed = true"
    Route->>Worker: trigger table-delete task
    Note over Route,Worker: on trigger failure → releaseJobClaim → throw
    Route-->>Client: "{jobId}"

    loop keyset pages
        Worker->>DB: updateJobProgress (heartbeat + ownership)
        DB-->>Worker: "owns = true"
        Worker->>DB: selectRowIdPage (table_id+id cursor, cutoff, filter)
        DB-->>Worker: page ids
        Worker->>DB: deletePageByIds
        Worker->>SSE: appendTableEvent(running, progress)
    end

    Worker->>DB: markJobReady(tableId, jobId)
    Worker->>SSE: appendTableEvent(ready, total)
    Note over Client,SSE: Grid reads use pendingDeleteMask to hide doomed rows
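
The claim-then-dispatch portion of the diagram can be sketched with an in-memory stand-in for the partial-unique index. The method names markTableJobRunning and releaseJobClaim come from the PR, but their signatures and bodies here are assumptions, and dispatch is synchronous only to keep the example short.

```typescript
type JobType = "import" | "delete" | "export" | "backfill";

interface RunningJob {
  tableId: string;
  jobId: string;
  type: JobType;
}

class JobClaims {
  private running: RunningJob[] = [];

  /**
   * Mirrors INSERT ... ON CONFLICT DO NOTHING against a unique index on
   * (table_id) WHERE status = 'running' AND type <> 'export'.
   */
  markTableJobRunning(tableId: string, jobId: string, type: JobType): boolean {
    const conflicts =
      type !== "export" &&
      this.running.some((j) => j.tableId === tableId && j.type !== "export");
    if (conflicts) return false;
    this.running.push({ tableId, jobId, type });
    return true;
  }

  releaseJobClaim(tableId: string, jobId: string): void {
    this.running = this.running.filter((j) => !(j.tableId === tableId && j.jobId === jobId));
  }
}

/** Claims the job slot, dispatches, and releases the claim if dispatch throws. */
function kickOff(
  claims: JobClaims,
  tableId: string,
  jobId: string,
  type: JobType,
  dispatch: () => void
): void {
  if (!claims.markTableJobRunning(tableId, jobId, type)) {
    throw new Error("another writing job is already running on this table");
  }
  try {
    dispatch();
  } catch (err) {
    // Without this release, a failed dispatch would leave a ghost job holding the slot.
    claims.releaseJobClaim(tableId, jobId);
    throw err;
  }
}
```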

Reviews (5): Last reviewed commit: "Merge remote-tracking branch 'origin/sta..." | Re-trigger Greptile

Comment thread apps/sim/hooks/queries/tables.ts Outdated
…ed optimistic clear, Cmd+A select-all, hide delete from tray)
…row-deletes

# Conflicts:
#	apps/sim/app/workspace/[workspaceId]/tables/[tableId]/table.tsx
#	apps/sim/app/workspace/[workspaceId]/tables/components/import-csv-dialog/import-csv-dialog.tsx
#	apps/sim/app/workspace/[workspaceId]/tables/tables.tsx
#	apps/sim/lib/table/import-runner.ts
#	apps/sim/lib/table/service.ts
#	packages/db/migrations/meta/0227_snapshot.json
#	packages/db/migrations/meta/_journal.json
Comment thread apps/sim/lib/table/service.ts
Comment thread apps/sim/lib/api/contracts/tables.ts Outdated
Comment thread apps/sim/lib/table/dispatcher.ts Outdated
Comment thread apps/sim/app/api/cron/cleanup-stale-executions/route.ts
Comment thread apps/sim/hooks/queries/tables.ts
Select-all minus deselected rows now means exactly that for every bulk
action, not just delete. runColumnBodySchema and cancelTableRunsBodySchema
accept excludeRowIds (bounded by MAX_EXCLUDE_ROW_IDS, select-all scope
only); the dispatch scope persists it and the dispatcher window walk,
eager bulk-clear, pre-run cancel, and filter/table-scoped cancel all skip
excluded rows. Client threads exclusions from the selection through the
action bar and the grid context menu, including the optimistic stamps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread apps/sim/app/api/table/import-async/route.ts
Comment thread apps/sim/lib/table/workflow-columns.ts Outdated
…lder table

Two Bugbot findings on the exclusion work:
- Select-all-minus-deselections Stop (no filter) cancelled every active
  dispatch table-wide, killing row-scoped runs on deselected rows.
  markActiveDispatchesCancelled now spares dispatches whose scope.rowIds
  are fully contained in the exclusion set (coalesce(false) keeps
  table-wide dispatches cancellable).
- Create-mode import: a failed trigger.dev dispatch released the job
  claim but left the just-created placeholder table in the workspace.
  Archive it on the failure path (no hard-delete surface exists).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…row-deletes

Migration renumbered 0231 -> 0232 (staging took 0231) and reworked for
the new CI-driven migration runner:
- expand/contract: the import_* column drops are removed — the previous
  app version still reads them while migrations apply from CI; a later
  release drops them (data migration gains ON CONFLICT DO NOTHING and
  the whole file is replay-idempotent per the runner convention)
- user_table_rows indexes (keyset btree + tenant-scoped GIN) build via
  COMMIT + CREATE INDEX CONCURRENTLY so the write-hot shared relation is
  never write-blocked; old GIN dropped CONCURRENTLY after the new one
Also unions the rows-route imports with staging's row-wire translators
and re-baselines the route count (811 staging + 2 export routes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A mid-delete refresh resurrected the old counts: the optimistic update
stripped cached rows but left page-0 totalCount (footer / select-all
label) at the old total, and list/detail counts reported raw row_count
including doomed-but-not-yet-deleted rows.

- onMutate now sets the active view's totalCount to the kept rows and
  decrements the cached detail rowCount by the doomed estimate
- the kickoff persists that estimate on the job (payload.doomedCount,
  clamped server-side); getTableById/listTables subtract the
  not-yet-deleted remainder (doomedCount - rows_processed) while the
  delete runs, so refetched counts match the read path's delete mask
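
The count adjustment described in this commit reduces to a small formula. The sketch below is illustrative: the function names and the exact clamp are assumptions, and it presumes the raw row_count drops as each page is deleted, which is what keeps the displayed count steady.

```typescript
interface RunningDelete {
  doomedCount: number; // client estimate persisted at kickoff
  rowsProcessed: number; // rows the worker has deleted so far
}

/** Clamps the client-supplied estimate into [0, rowCount] (exact server-side clamp assumed). */
function clampDoomedCount(estimate: number, rowCount: number): number {
  return Math.max(0, Math.min(Math.floor(estimate), rowCount));
}

/** Raw row count minus the doomed rows that have not been deleted yet. */
function displayedRowCount(rawRowCount: number, job: RunningDelete | null): number {
  if (job === null) return rawRowCount;
  const notYetDeleted = Math.max(0, job.doomedCount - job.rowsProcessed);
  return Math.max(0, rawRowCount - notYetDeleted);
}
```

With 1,000 rows and 400 doomed, the displayed count is 600 at kickoff, 600 after 100 rows are deleted (raw count 900, remainder 300), and 600 when the job finishes.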

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
toast.success('Export started — the download will begin when it finishes')
} else {
await downloadTableExport(tableData.id, tableData.name)
}


Export callback missing hook dependency

Low Severity

handleExportCsv calls exportTableAsync.mutateAsync but its useCallback dependencies list only tableData and workspaceId, not exportTableAsync (from useExportTableAsync declared later). After navigation or hook identity changes, export can run against a stale mutation bound to the wrong table.


Reviewed by Cursor Bugbot for commit 4118766.

Collaborator Author


False positive per repo convention (.claude/rules/sim-queries.md): TanStack v5 mutateAsync is stable and always invokes the current mutationFn. The mutation observer reads the latest options on each call, so a memoized closure over an older mutation object still targets the current tableId. The rules explicitly say not to add mutation objects to useCallback deps. Additionally, tableData is in the deps and changes identity on table navigation, which refreshes the closure anyway.

The header checkbox lingered as a minus over the optimistically-emptied
grid: rowSelectionCoversAll treats zero rows as not-covered, and the
selection clear waited for the kickoff's onSuccess. Clear at click
(failed kickoffs visibly restore rows + toast; re-selecting is cheap)
and render an empty grid's header checkbox unchecked regardless — a
selection over zero rows is vacuous.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread apps/sim/app/workspace/[workspaceId]/tables/[tableId]/table.tsx
The sync/async export choice reads rowCount, which is a doomed-estimate-
adjusted number during a running delete (and the estimate is client-
supplied) — an overstated estimate could route a still-large masked set
through the synchronous stream. Mid-delete exports now always run as a
job: safe at any size, and exports bypass the one-job-per-table gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…row-deletes

Migration renumbered 0232 -> 0233 (staging took 0232 for BYOK keys);
snapshot regenerated, hand-written SQL preserved, zero drift.
checkUniqueConstraintsDb reconciles staging's executor param (pool
self-deadlock fix #4975) with the tenant-bounded planner flag: own
transaction only when given plain db, SET LOCAL on the caller's
transaction otherwise. process-contents test keeps relying on global
mocks (now incl. dbReplica). Route baseline 815 (+2 staging tools).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…raphs

next build (Turbopack) failed with "Two or more assets with different
content were emitted to the same output path" on the server-root chunk.
Root cause: setup.server.ts's unscoped path.resolve(process.cwd()) made
node-file-tracing sweep the entire project — next.config.ts included —
into every route graph reaching lib/uploads (the files/upload route and,
since the export job, the export-async path). Two producers emitted the
swept config into same-named chunks; staging's latest commits made their
contents diverge and the names collided. Annotate the path derivation
with turbopackIgnore per the NFT warning's own remediation — the build
passes and all ~390 "unexpected file in NFT list" warnings disappear.

Also inline the releaseJobClaim dynamic imports in the kickoff routes to
plain static imports — service is already statically imported there.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@TheodoreSpeaks

Collaborator Author

@greptile review

…row-deletes

Route baseline 816 (integrations' routes + our two export routes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit 4fc09f7.

Comment thread apps/sim/hooks/queries/tables.ts
@TheodoreSpeaks TheodoreSpeaks merged commit 53fdcab into staging Jun 12, 2026
15 checks passed
@TheodoreSpeaks TheodoreSpeaks deleted the improvement/table-row-deletes branch June 12, 2026 07:03